Abstract

Nostoc punctiforme is a filamentous cyanobacterium with extensive phenotypic characteristics and a relatively large genome, approaching 10 Mb. The phenotypic characteristics include a photoautotrophic, diazotrophic mode of growth, but N. punctiforme is also facultatively heterotrophic; its vegetative cells have multiple developmental alternatives, including terminal differentiation into nitrogen-fixing heterocysts and transient differentiation into spore-like akinetes or motile filaments called hormogonia; and N. punctiforme has broad symbiotic competence with fungi and terrestrial plants, including bryophytes, gymnosperms and an angiosperm. The shotgun-sequencing phase of the N. punctiforme strain ATCC 29133 genome has been completed by the Joint Genome Institute. Annotation of an 8.9 Mb database yielded 7432 open reading frames, 45% of which encode proteins with known or probable known function and 29% of which are unique to N. punctiforme. Comparative analysis of the sequence indicates a genome that is highly plastic and in a state of flux, with numerous insertion sequences and multilocus repeats, as well as genes encoding transposases and DNA modification enzymes. The sequence also reveals the presence of genes encoding putative proteins that collectively define almost all characteristics of cyanobacteria as a group. N. punctiforme has an extensive potential to sense and respond to environmental signals, as reflected by the presence of more than 400 genes encoding sensor protein kinases, response regulators and other transcriptional factors. The signal transduction systems and any of the large number of unique genes may play essential roles in the cell differentiation and symbiotic interaction properties of N. punctiforme.



Introduction: Physiological capabilities of Nostoc punctiforme

Nostoc punctiforme is a nitrogen-fixing cyanobacterium, found predominantly in terrestrial habitats. N. punctiforme displays an extraordinarily wide range of vegetative cell developmental alternatives, physiological properties and ecological niches, including symbiotic associations with plants and fungi. The multiple phenotypic characteristics and growth habitats of N. punctiforme are indicative of the breadth of genetic information that is likely to be present in its genome. Completion of the shotgun sequencing phase of the N. punctiforme genome also indicates an exceptionally large microbial genome, approaching 10 Mb. Herein, we summarize the genetic potential detectable in the currently unfinished genome of N. punctiforme that encompasses nearly all characteristics that define cyanobacteria as a group.

The cyanobacteria, or oxyphotobacteria (formerly called blue-green algae), are a very ancient group of microorganisms, with a morphological fossil record extending to approximately 3.0-3.5 billion years ago (Ga) and a chemical record to approximately 2.8 Ga (Des Marais 2000). All cyanobacteria are uniquely characterized as oxygen-evolving photoautotrophic prokaryotes that employ chlorophyll a as the photochemically active pigment. Ancestors of extant cyanobacteria profoundly changed the biosphere in two ways. First, the photosynthetic production of oxygen slowly saturated the reactive chemicals in the immediate aquatic habitats (e.g. oxidation of iron to establish geological banded iron formations, between 3.0 and 2.0 Ga), before bringing the atmosphere to near its current oxygen content (Schopf 2000). Second, the formation of endosymbiotic associations with eukaryotic cells led to the evolution of chloroplast-containing algae and terrestrial plants (Douglas 1994), an event that occurred between 1.5 and 0.6 Ga, after free oxygen had become a significant selective force in the environment.

N. punctiforme is classified in the cyanobacterial order Nostocales, defined by an unbranched filamentous morphology and the ability to differentiate heterocysts (Figure 1) (Castenholz and Waterbury 1989). Heterocysts are microoxic cells specialized for nitrogen fixation in an oxic environment (Wolk et al. 1994). The multiple phenotypic characteristics of N. punctiforme are listed in Table 1. Many of the phenotypic characteristics of N. punctiforme are directly relevant to increasing its fitness for photosynthesis. N. punctiforme synthesizes the phycobiliproteins phycoerythrin (PE), phycocyanin (PC) and allophycocyanin (APC), which are assembled into a multiprotein complex, the phycobilisome. Except for the chlorophyll a- and b-containing oxychlorobacteria (prochlorophytes that phylogenetically cluster within the division cyanobacteria; Hess et al. 2001), APC and PC are found in all cyanobacteria, while the presence of PE is variable with no obvious taxonomic correlation. Cyanobacterial strains containing PE may undergo a process of complementary chromatic adaptation (CCA) in response to light quality. N. punctiforme is categorized as having Type II CCA, in that the synthesis of PE, but not PC, is controlled by the presence or absence of green light; this is in contrast to Type III CCA, where synthesis of PE and PC is controlled by the reciprocal presence or absence of green and red light (Tandeau de Marsac 1977). CCA is a clear advantage for light harvesting by cyanobacteria in competitive habitats, such as microbial mats and soils beneath forest canopies, where the spectral quality of the light may change frequently.

Figure 1. Photomicrograph of Nostoc punctiforme filaments showing the various developmental states. Heterocysts, akinetes and hormogonium filaments are indicated. The image is phase contrast; vegetative cells are typically 5-6 µm in diameter, heterocysts 6-10 µm and akinetes 10-20 µm.

Cyanobacteria that grow in full light as unshaded mats and on soils are subjected to high incidences of UV light, and many respond by synthesizing UV-absorbing compounds. N. punctiforme synthesizes two classes of UV-absorbing compounds: scytonemin (Hunsucker et al. 2001) and mycosporine-like amino acids (F. Garcia-Pichel, pers. comm.). Although cyanobacteria are also well known for repair of UV-induced DNA damage (Levine and Thiel 1987), the presence of UV-absorbing compounds increases the chances of survival and continued photosynthesis in habitats exposed to high light.
Table 1. Phenotypic characteristics of Nostoc punctiforme

1. Primarily an oxygenic photoautotrophic, diazotrophic mode of growth
2. Type II complementary chromatic adaptation by varying phycoerythrin content
3. Produces UV-absorbing compounds in response to UV light
4. Regulates acquisition of inorganic nutrients, especially nitrogen (NH4+ > NO3- > N2)
5. Dark heterotrophic growth on sucrose, glucose or fructose
6. Multiple developmental alternatives exemplifying a complex life cycle (Figure 1)
7. Buoyant and gliding motility with photo- and chemotactic responses
8. Broad symbiotic competence with fungi and terrestrial plants



N. punctiforme can utilize various inorganic and organic nitrogen sources for growth. The inorganic nitrogen sources are assimilated in the hierarchical order NH4+ > NO3- > N2. Growth on N2 requires the differentiation of heterocysts (see below), and heterocyst differentiation is repressed by the presence of NO3- or NH4+ in N. punctiforme (Campbell and Meeks 1992) and in essentially all heterocyst-forming cyanobacteria (Wolk et al. 1994). Nitrate assimilation is repressed by the presence of NH4+ in all cyanobacteria examined (Flores and Herrero 1994). Flexibility in utilizing various nitrogen sources for growth allows N. punctiforme to colonize and compete as a phototroph in illuminated habitats, irrespective of the specific nitrogen source.

N. punctiforme vegetative cells have three developmental fates that depend on the environmental growth conditions. Heterocyst formation is one of these developmental alternatives. These nitrogen-fixing cells terminally differentiate in response to a limitation in combined nitrogen and appear at a frequency of 8-9% of the total cells in a well-spaced pattern within the filament (Figure 1). Heterocysts are highly modified in their metabolism; they maintain the microoxic cellular environment essential for nitrogen fixation in part by eliminating the oxygenic photosynthetic reactions and converting to a heterotrophic metabolic mode with a high respiration rate (Wolk et al. 1994). Heterocyst differentiation and maintenance are variously estimated to involve from no more than 140 (Wolk 2000) to about 1000 (Lynn et al. 1986) genes.

Under conditions of cellular energy limitation imposed by, for example, phosphate limitation, all cells transiently differentiate into spores (Figure 1), called akinetes in cyanobacteria. These structures remain viable for hundreds of years under desiccated conditions prior to germination into vegetative filaments (Adams and Duggan 1999). Akinetes are speculated to be progenitors of heterocysts (Wolk et al. 1994); there is little genetic information available on their differentiation or germination.

In response to stress signals, all cells within a filament may divide in the absence of biomass increase or DNA replication to transiently form motile filaments called hormogonia (Figure 1). These gliding filaments serve as propagules in the colonization of new portions of a habitat. Motile hormogonia express photo- and chemotactic behavior. Many hormogonia, including those of N. punctiforme (Rippka and Herdman 1992), possess an additional motility system: gas vesicles provide buoyancy in soil or aquatic water columns. Hormogonia return to vegetative filaments by differentiation of heterocysts, resumption of biomass increase and initiation of DNA replication (Meeks 1998).

The transition of vegetative filaments through akinete and hormogonium states reflects a complex life cycle in N. punctiforme. The patterned spacing of heterocysts in filaments and the source-sink relationship between vegetative cells (suppliers of reduced carbon and sinks for reduced nitrogen) and heterocysts (suppliers of reduced nitrogen and sinks for reduced carbon) reflect extensive cell-cell communication and suggest that N. punctiforme, and related heterocyst-forming cyanobacteria, are by definition also multicellular organisms.

N. punctiforme is among a limited number of cyanobacteria that can grow in continual darkness as a respiratory heterotroph when supplied with sucrose, glucose or fructose, although the rate is less than half of the photoautotrophic rate (Summers et al. 1995). The capacity for prolonged heterotrophic growth by N. punctiforme correlates with induced synthesis of glucose-6-phosphate dehydrogenase, the initial enzyme of the oxidative pentose phosphate pathway (OPP). The OPP, rather than glycolysis, is the primary route of carbon catabolism in cyanobacteria (Summers et al. 1995). The heterotrophic capacity of N. punctiforme is also consistent with its ability to grow in symbiotic association with plants.

N. punctiforme has broad symbiotic competence with fungi and plants. Strain ATCC 29133 (PCC 73102) was isolated as a symbiont from a coralloid root of the gymnosperm cycad Macrozamia sp. (Rippka and Herdman 1992). Cultured N. punctiforme ATCC 29133 has been experimentally documented to establish a nitrogen-fixing symbiosis with the bryophyte hornwort Anthoceros punctatus (Enderlin and Meeks 1983) and the angiosperm Gunnera spp. (Johansson and Bergman 1994). N. punctiforme is also identified as the intracellular symbiont of the unique mycorrhizal fungus Geosiphon pyriforme (Mollenhauer et al. 1996). The physiology of symbiotic Nostoc species is profoundly changed by their interaction with plants: growth and photosynthetic capabilities are diminished, the rate of nitrogen fixation is increased, and heterotrophic metabolism in support of nitrogen fixation is enhanced. The plant partners have been shown to influence the differentiation and behavior of N. punctiforme hormogonia through molecular signals (Campbell and Meeks 1989; Cohen and Meeks 1997) and are presumed to similarly affect heterocyst differentiation and metabolism within the symbiosis (Campbell and Meeks 1992; Meeks 1998). The broad symbiotic competence of N. punctiforme, and a corresponding lack of plant specificity for a single Nostoc strain, imply that very diverse plants have adapted mechanisms to manipulate different heterocyst-forming cyanobacteria in the establishment of NH4+ producers for their growth. A planned functional genomic analysis of N. punctiforme is expected to reveal genes uniquely involved in microbe-plant interactions leading to a stable nitrogen-fixing association.

In this review we present the overall characteristics and notable properties of the N. punctiforme genome, with an emphasis on its photoautotrophic and diazotrophic phenotype. The N. punctiforme genome characteristics are broadly compared to those of some published bacterial genomes, to the finished genome of the unicellular cyanobacterium Synechocystis sp. strain PCC 6803 (http://www.kazusa.or.jp/cyano), and to the unfinished genomes of the closely related heterocyst-forming Anabaena sp. strain PCC 7120 (Kazusa) and the unicellular marine cyanobacteria Synechococcus sp. strain WH 8102 and Prochlorococcus marinus MED4 (both at http://www.jgi.doe.gov).




Methods 

A whole-genome shotgun sequencing strategy, pioneered by Fleischmann et al. (1995), was used to complete the sequence of the approximately 9.5 Mb genome of N. punctiforme strain ATCC 29133. DNA preparation protocols are posted at http://www.jgi.doe.gov/ under 'Production protocols'. Sequencing was done using Molecular Dynamics MegaBACE and Applied Biosystems ABI Prism 3700 instruments. Shotgun sequence data were assembled with PHRAP and through 'auto-finishing' using software written by Matt Nolan (JGI/Lawrence Livermore National Laboratory) and David Gordon (University of Washington).

Assembly of a 9 125 999 bp database yielded 662 contigs; the 320 contigs supported by more than 8 reads, totaling 8 941 326 bp, were annotated. Three gene-modeling programs, Critica, Glimmer and Generation, were used to define open reading frames (ORFs). The results of the three gene callers were combined, and a BLASTP search of the translations against the non-redundant (NR) sequence database was conducted. The alignment of the N-terminus of each gene model with its best NR match was used to pick a preferred gene model and translation start point. If no BLAST match was returned, the longest model was retained. Gene models that overlapped by more than 10% of their length were flagged, giving preference to genes with a BLAST match. The revised gene/protein set was searched against the KEGG GENES, Pfam, PROSITE, PRINTS, ProDom and COGs databases, in addition to BLASTP against NR. In pairwise similarity searches of gene-model translations, the BLAST E-value threshold was set to 1e-05. The initial Pfam threshold was also set at 1e-05. Enzyme catalog references were obtained by parsing the BLAST results against the total database. Putative genes were organized into functional categories based on KEGG categories and COG hierarchies. Additional analyses are based on sequence comparisons to CyanoBase, GenBank and SWISS-PROT. Analysis of oligonucleotide frequencies was performed using locally written software.
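The gene-model reconciliation rules described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the JGI pipeline: the `GeneModel` record, its hypothetical `nterm_offset` field (distance between a model's start and the N-terminus of its best NR alignment, `None` when BLAST returned nothing), the use of the shorter model's length as the reference for the 10% overlap rule, and single-contig coordinates are all choices made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class GeneModel:
    """One candidate ORF call (hypothetical record; a single contig is assumed)."""
    caller: str                          # e.g. "Critica", "Glimmer", "Generation"
    start: int                           # 1-based, inclusive
    end: int                             # 1-based, inclusive
    strand: str                          # "+" or "-" (not used by these simplified rules)
    nterm_offset: Optional[int] = None   # residues to best-hit N-terminus; None = no BLAST hit

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def pick_model(candidates: List[GeneModel]) -> GeneModel:
    """Choose one model among alternative calls for the same ORF."""
    with_hit = [m for m in candidates if m.nterm_offset is not None]
    if with_hit:
        # Prefer the start that best agrees with the N-terminus of the best NR hit.
        return min(with_hit, key=lambda m: abs(m.nterm_offset))
    # No BLAST match for any candidate: keep the longest model.
    return max(candidates, key=lambda m: m.length)


def overlap(a: GeneModel, b: GeneModel) -> int:
    """Number of bases shared by two models (0 if disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def flag_overlaps(models: List[GeneModel], max_fraction: float = 0.10
                  ) -> List[Tuple[GeneModel, GeneModel, Optional[GeneModel]]]:
    """Flag pairs overlapping by more than max_fraction of the shorter model.

    The third element is the preferred model: the one with a BLAST match when
    exactly one of the pair has a match, otherwise None (left for curation).
    """
    flagged = []
    for i, a in enumerate(models):
        for b in models[i + 1:]:
            if overlap(a, b) > max_fraction * min(a.length, b.length):
                a_hit = a.nterm_offset is not None
                b_hit = b.nterm_offset is not None
                preferred = (a if a_hit else b) if a_hit != b_hit else None
                flagged.append((a, b, preferred))
    return flagged


# Three callers disagree on the start of the same ORF.
calls = [
    GeneModel("Critica", 100, 1000, "+", nterm_offset=12),
    GeneModel("Glimmer", 136, 1000, "+", nterm_offset=0),
    GeneModel("Generation", 70, 1000, "+"),
]
print(pick_model(calls).caller)  # Glimmer
```

In this toy case the Glimmer start coincides with the N-terminus of the best hit, so it wins even though the Generation model is longer; the longest-model rule only applies when no candidate has a BLAST hit.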



Broad overview of the genome 

The overall numerical analysis of the N. punctiforme ATCC 29133 genome is given in Table 2. An 11.4× sequence coverage database of between 9.25 and 9.5 Mb that is currently being analyzed implies that the annotated database in Table 2 reflects at least 94% of the actual genome. Thus, the complete genome should contain between 7432 and fewer than 8000 protein-encoding ORFs. This is amongst the largest of the microbial genomes being sequenced, a group that includes the developmentally complex heterotrophic bacteria Streptomyces coelicolor and Myxococcus xanthus and the budding yeast Saccharomyces cerevisiae. A plot of the frequency of N. punctiforme ORFs as a function of predicted molecular mass yielded an average protein size of 35.9 kDa. Because the genome is currently unfinished, we have not attempted detailed analysis of gene location, orientation or operon structure. The annotated sequence is available at http://www.jgi.doe.gov/tempweb/JGI_microbial/html/nostoc/nostoc_homepage.html.

Table 2. Numerical properties of the genome of Nostoc punctiforme strain ATCC 29133

Current genome size: ~9 500 000 bases (11.4× sequencing coverage; February 2001)
Size of annotated sequence: 8 941 326 bases (8× sequencing coverage; 94% of genome; June 6, 2000)
Preliminary analyses: 7432 protein-encoding ORFs identified
Relative to total database:
  5314 of the ORFs (71% of the total) can be associated with a previously recognized ORF
    3328 of the recognized ORFs (45% of the total) encode proteins with known or probable known function
    1986 of the recognized ORFs (27% of the total) encode conserved hypothetical and hypothetical proteins with no known function
  2164 of the ORFs (29% of the total) can NOT be associated with a previously recognized ORF
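The bound of fewer than 8000 ORFs follows from scaling the annotated ORF count by the fraction of the genome that was annotated. The sketch below makes that arithmetic explicit; `projected_orfs` is a hypothetical helper, and it assumes that the unannotated part of the genome has the same ORF density as the annotated 8 941 326 bases.

```python
ANNOTATED_BASES = 8_941_326   # annotated sequence (Table 2)
ANNOTATED_ORFS = 7_432        # ORFs called in the annotated sequence


def projected_orfs(genome_size: int) -> float:
    """Scale the ORF count to the whole genome.

    Assumes (hypothetically) that the unannotated part of the genome has the
    same ORF density as the annotated part.
    """
    annotated_fraction = ANNOTATED_BASES / genome_size
    return ANNOTATED_ORFS / annotated_fraction


# Current size range suggested by the 11.4x assembly.
for size in (9_250_000, 9_500_000):
    fraction = ANNOTATED_BASES / size
    print(f"{size / 1e6:.2f} Mb: {fraction:.1%} annotated, ~{projected_orfs(size):.0f} ORFs")
```

With these inputs the projection comes to roughly 7700 to 7900 ORFs (96.7% and 94.1% annotated), consistent with the stated bound. The uniform-density assumption is the weak point, since the unannotated remainder may be richer or poorer in genes than the rest.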

Seventy-one percent of the N. punctiforme ORFs 
can be associated with a previously recognized ORF. 
The presence of 3328 ORFs (45%) that encode pro- 
teins with known or probable known function implies 
the likely occurrence of multiple gene families with 
core metabolic function. The observation that about 
29% of the putative ORFs in the N. punctiforme gen- 
ome have no significant similarity to sequences in the 
current database is similar to results in other microbial 
genomes, such as Pseudomonas aeruginosa (Stover et 
al. 2000). 

Broad comparisons to Synechocystis PCC 6803 and Anabaena PCC 7120

Synechocystis PCC 6803 has a 3.57 Mb genome and contains 3215 protein-encoding ORFs; 1521 of those ORFs encode proteins with known or probable known function, while 1694 ORFs encode conserved hypothetical or hypothetical proteins. BLAST analysis showed that 3965 N. punctiforme ORFs have significant similarity in the Synechocystis PCC 6803 genome, while 2547 Synechocystis PCC 6803 ORFs are similar to N. punctiforme ORFs. These results imply that the large N. punctiforme genome is not an exact multiple of the 2.7-fold smaller Synechocystis PCC 6803 genome; however, the 1418 N. punctiforme ORFs with similarity in Synechocystis PCC 6803, in excess of the 2547 Synechocystis PCC 6803 ORFs that find similarity in N. punctiforme, must represent multiple members of gene families that could have evolved from simple gene families in a common ancestor of both extant organisms. This analysis also showed that 668 of the Synechocystis PCC 6803 ORFs are not present in N. punctiforme, which indicates that, like other microorganisms, Synechocystis PCC 6803 has its own unique genes.
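The counting argument in this paragraph is simple arithmetic on reciprocal BLAST tallies. The sketch below uses a hypothetical `ReciprocalHits` record to show how the excess and the genome-specific counts are derived; it assumes the hit relation is roughly symmetric, so that more A-side than B-side hits means some B ORFs must be matched by several A ORFs (for example, members of expanded gene families).

```python
from dataclasses import dataclass


@dataclass
class ReciprocalHits:
    """Counts from a two-way BLAST comparison of genome A with genome B."""
    a_with_hit_in_b: int   # ORFs of A with a significant match in B
    b_with_hit_in_a: int   # ORFs of B with a significant match in A
    b_total: int           # all ORFs of B

    def excess(self) -> int:
        # If more A ORFs than B ORFs find partners, at least some B ORFs
        # must be matched by several A ORFs (assuming roughly symmetric hits).
        return self.a_with_hit_in_b - self.b_with_hit_in_a

    def b_without_hit(self) -> int:
        # B ORFs with no detectable counterpart in A.
        return self.b_total - self.b_with_hit_in_a

    def b_fraction_with_hit(self) -> float:
        return self.b_with_hit_in_a / self.b_total


nostoc_vs_synechocystis = ReciprocalHits(a_with_hit_in_b=3965, b_with_hit_in_a=2547, b_total=3215)
print(nostoc_vs_synechocystis.excess())         # 1418
print(nostoc_vs_synechocystis.b_without_hit())  # 668
```

The same record applied to the Anabaena PCC 7120 counts (5431, 4814 and 5610) gives an excess of 617 and an 86% fraction of Anabaena ORFs with a partner in N. punctiforme.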

The Anabaena PCC 7120 genome, at 7.13 Mb with 5610 ORFs, is about 75% of the size of the N. punctiforme genome. The reciprocal BLAST results indicate that 5431 of the N. punctiforme ORFs are present in Anabaena PCC 7120 and 4814 (86%) of the Anabaena PCC 7120 ORFs are present in N. punctiforme. Since 617 more N. punctiforme ORFs find similarity in Anabaena PCC 7120 than the reciprocal, it appears that N. punctiforme contains multiple copies of some Anabaena PCC 7120 ORFs. Anabaena PCC 7120, like Synechocystis PCC 6803, contains a similar number of ORFs (797) that appear unique relative to N. punctiforme. Of the 1283 ORFs of Anabaena PCC 7120 that are unique in the database, 486 find similarity in the unique ORFs of the N. punctiforme genome. This observation has two implications. First, these shared ORFs may encode proteins involved in common phenotypic characteristics, such as heterocyst differentiation. Second, the actual number of unique ORFs in N. punctiforme relative to the total database, now including Anabaena PCC 7120, reduces to 1578, or 23% of the genome.

Table 3. Functional categories of predicted known or probable known genes

Functional category | Number of putative genes (a) | Percent of total genes (b)
1. Energy metabolism [photosynthetic & respiratory electron transport, ATP synthesis] | 98 | 1.32
2. Inorganic nutrients [transport & metabolism of C, N, S, P, Fe, H & other ions] | 204 | 2.74
3. Organic carbon [transport & metabolism of glycogen, sucrose, hexoses, pentoses & carboxylates] | 200 | 2.69
4. Metabolic dehydrogenases, oxidoreductases, monooxygenases | 134 | 1.80
5. Amino acids [transport & metabolism, including cyanophycin] | 272 | 3.66
6. Nucleotides [transport & metabolism] | 59 | 0.79
7. Lipids, fatty acids, polyketides & cyclic peptides | 137 | 1.84
8. Coenzymes, vitamins, porphyrins & bilins | 113 | 1.52
9. Cell secretion & chemotaxis | 61 | 0.82
10. Cell envelope synthesis, cell division & chromosome segregation | 278 | 3.74
11. DNA repair, replication & recombination, including transposases | 271 | 3.65
12. Transcription [RNA polymerase & transcriptional regulators] | 82 | 1.10
13. Translation [tRNA synthetases, ribosomes & initiation] | 144 | 1.93
14. Signal transduction mechanisms [protein kinases, response regulators & cAMP] | 373 | 5.02
15. Protein modification [including molecular chaperones, glutathione, thioredoxin & proteases] | 152 | 2.05
16. Unassigned probable enzymes & structural proteins | 750 | 10.09

a The numbers of genes in each category are provisional; many have been analyzed for functional sites, completeness of sequence and accuracy in the gene identity, but the numbers will reflect the mistakes and omissions inevitable in an automated annotation.
b The percentage is based on the current total of 7432 ORFs.
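Table 3 can be checked internally: its 16 category counts should add up to the 3328 ORFs with known or probable known function, and each percentage should be the count divided by the 7432-ORF total (footnote b). The short sketch below, with the counts transcribed from the table and shortened category names, recomputes both.

```python
TOTAL_ORFS = 7432

# (category, count, percent as printed in Table 3)
TABLE_3 = [
    ("Energy metabolism", 98, 1.32),
    ("Inorganic nutrients", 204, 2.74),
    ("Organic carbon", 200, 2.69),
    ("Dehydrogenases, oxidoreductases, monooxygenases", 134, 1.80),
    ("Amino acids", 272, 3.66),
    ("Nucleotides", 59, 0.79),
    ("Lipids, fatty acids, polyketides & cyclic peptides", 137, 1.84),
    ("Coenzymes, vitamins, porphyrins & bilins", 113, 1.52),
    ("Cell secretion & chemotaxis", 61, 0.82),
    ("Cell envelope, division & chromosome segregation", 278, 3.74),
    ("DNA repair, replication & recombination", 271, 3.65),
    ("Transcription", 82, 1.10),
    ("Translation", 144, 1.93),
    ("Signal transduction", 373, 5.02),
    ("Protein modification", 152, 2.05),
    ("Unassigned probable enzymes & structural proteins", 750, 10.09),
]

total_known = sum(count for _, count, _ in TABLE_3)
print(total_known)  # 3328, the ORFs with known or probable known function

for name, count, printed in TABLE_3:
    computed = 100 * count / TOTAL_ORFS
    # Each printed value agrees with count / 7432 to within 0.01 percentage points.
    assert abs(computed - printed) < 0.01, name
    print(f"{name:<52} {count:>4} {computed:6.2f}%")
```

The largest deviation is for translation (144 genes, 1.94% when rounded versus 1.93% printed), which looks like truncation rather than rounding.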

Functional categories of N. punctiforme ORFs

Based on automated annotation results, the 3328 ORFs that encode proteins of known or probable known function were organized into categories that are most relevant to the N. punctiforme life style (Table 3). The numbers of genes involved in energy metabolism, nucleotide metabolism, coenzymes and pyrroles, and protein synthesis are similar to those of Synechocystis PCC 6803 and the heterotrophic bacteria Bacillus subtilis (Kunst et al. 1997), Escherichia coli (Blattner et al. 1997) and P. aeruginosa (Stover et al. 2000). N. punctiforme has an unusually large number of genes encoding alcohol and aldehyde dehydrogenases, oxidoreductases and putative F420 monooxygenases. These genes would be categorized within energy metabolism in heterotrophic organisms, but it is unlikely that their role is to provide reductant for respiratory electron transport in N. punctiforme; thus, they were placed in a separate category. Their metabolic roles are unknown.

Consistent with its ability to grow in the light on CO2 and simple salts, N. punctiforme has a large number of genes for the acquisition and metabolism of inorganic nutrients. Included in this category are 91 genes encoding ion uptake and efflux transport components. N. punctiforme also has a comparatively large number of genes involved in lipid, fatty acid and complex polyketide synthesis, cell envelope synthesis, DNA metabolism, signal transduction and protein modification. The lipid, fatty acid and complex polyketide category contains two groups of genes encoding multidomain proteins. One group of polyketide synthases is involved in the synthesis of the unique heterocyst glycolipids. The other group is involved in the synthesis of cyclic peptide secondary metabolites, which will be discussed later. The cell envelope/division category includes 74 glycosyl transferases and 23 ATPases involved in chromosome partitioning that are infrequently detected in other bacteria. The DNA metabolism category includes approximately 150 putative transposases (see below); only Synechocystis PCC 6803 has a comparable number of putative transposase genes. The signal transduction category includes 255 genes encoding sensory histidine kinase and response regulator proteins and 55 serine/threonine protein kinases. These genes are present in high numbers relative to other bacteria and will be discussed later. The protein modification category includes, in addition to various chaperonins, multiple genes encoding various proteases, ATPases, and proteins for the synthesis and metabolism of thioredoxin and glutathione.

N. punctiforme has many genes involved in amino acid transport (56) and metabolism (216), whose total is equivalent to that in B. subtilis and nearly twice, or more than twice, those in P. aeruginosa and Synechocystis PCC 6803, respectively. Given the nutritionally independent life style of N. punctiforme, such a robust amino acid transport capacity was unanticipated. The number of N. punctiforme genes involved with organic carbon metabolism is less than half that of B. subtilis, but 1.5 times higher than that identified in Synechocystis PCC 6803. Only sucrose, glucose and fructose, which can be catabolized through the oxidative pentose phosphate pathway, support heterotrophic growth of N. punctiforme; there is no evidence for diverse pathways of carbon metabolism to generate hexoses from other compounds or polymers. Transport of the hexoses to support growth appears more than adequate, since N. punctiforme has 52 genes encoding organic carbon transport proteins, 75% of which appear to be for simple sugars. The additional hexose transport genes may serve roles other than catabolic substrate supply.




N. punctiforme has distinctly fewer genes encoding transcriptional regulatory factors and cell secretion and chemotaxis proteins than does P. aeruginosa, which contains 448 and 191 ORFs in these two respective categories. Nevertheless, the presence of 13 alternative group 2 sigma subunits of RNA polymerase and 46 transcriptional regulators, in addition to 61 response regulators with DNA-binding motifs, implies substantial capacity for differential gene expression in N. punctiforme, which is consistent with its environmentally dependent multiple developmental alternatives and its facultatively heterotrophic and diazotrophic growth states. N. punctiforme lacks the flagellar apparatus that is present in the heterotrophic bacteria, although hormogonia are motile by a gliding mechanism. Thus, the 61 putative genes in the secretion and taxis category, while 3-fold fewer than those present in P. aeruginosa, may seem unusually high. N. punctiforme has 3-5 copies each (21 total) of genes encoding homologs of the chemotaxis CheA, CheB and CheW signal transduction proteins, the CheD methyl-accepting protein and the CheR methylase, all of which are likely to be involved in hormogonium tactic behavior. In addition, there are 11 genes encoding Sec, Sec-independent and general protein secretion pathways.

In general, the genome of N. punctiforme reflects 
its complex life style. Only a few anticipated genes 
are absent in the unfinished database and they may be 
present in the completed sequence. Those genes in- 
volved in the phenotype defining energy, carbon and 
nitrogen metabolism categories will be discussed fur- 
ther in a subsequent section. Clusters of genes are 
present that were unanticipated based on the known 
N. punctiforme phenotype and there are notable multi- 
gene families; these will also be discussed later. 



The state of the genome in comparison to other cyanobacteria

A significant part of the information of a genome resides outside of genes. Protein binding sites may determine the regulation of gene transcription and the compact structure of the genome within the cell. The modification state of DNA may regulate the replication of the genome and aid in DNA repair. Transposable elements and repeated sequences may increase the plasticity of the genome and contribute to rapid changes over evolutionary time. In each of these regards, the genome of N. punctiforme is strikingly different from those of most bacteria.

Table 4. Base frequencies in cyanobacterial genomes (a)

Property | N. punctiforme | Anabaena PCC 7120 | Synechocystis PCC 6803 | P. marinus MED4 | Synechococcus WH8102
Estimated size | 9.5 Mb | 7.1 Mb | 3.6 Mb | 1.7 Mb | 2.4 Mb
Mol % GC | 41.5 | 41.2 | 47.7 | 30.9 | 58.5
Underrepresented dinucleotides (b) | CG 0.81, TA 0.81 | CG 0.78, TA 0.84 | CG 0.75, TA 0.75 | CG 0.51, TA 0.79 | CG 0.87, TA 0.43

Overrepresented dinucleotides (b): GC 1.24, GG/CC 1.36, GG/CC 1.28, TG/CA 1.26, AA/TT 1.32

a Analysis uses the complete sequence of Synechocystis PCC 6803 and partial sequences available for N. punctiforme (June 2000), Anabaena PCC 7120, P. marinus MED4 and Synechococcus WH8102.
b Under- and overrepresentation is given as the ratio (termed ρ) of the actual dinucleotide frequency to the dinucleotide frequency expected from the % GC (Karlin et al. 1997). All dinucleotides with ρ values less than 0.78 or greater than 1.22 are given, as well as values for CG and TA dinucleotides.

Underrepresented sequences

One way to look for sequences of functional importance is to focus on those that are present in the genome significantly more or less frequently than would be expected by chance. The relative abundances of specific dinucleotides have been shown to be stable across related taxa (Karlin et al. 1997), and it is evident that the most underrepresented dinucleotides in the N. punctiforme genome (CG and TA) are also those of other cyanobacterial genomes (Table 4). It should be noted that the TA dinucleotide is underrepresented in most eubacterial genomes that have been examined (Karlin et al. 1997).
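The relative abundance ρ used in Table 4 compares each dinucleotide's observed frequency with the product of its component base frequencies. The sketch below assumes the symmetrized form associated with Karlin and co-workers, in which counts are pooled over the sequence and its reverse complement (so f_G = f_C and f_A = f_T, and the expectation depends only on % GC). `dinucleotide_rho` is a hypothetical helper, not the locally written software mentioned in Methods, and the synthetic sequences are for illustration only.

```python
import random
from collections import Counter
from itertools import product
from typing import Dict

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def dinucleotide_rho(seq: str) -> Dict[str, float]:
    """Relative abundance rho_XY = f_XY / (f_X * f_Y), pooled over both strands.

    Ambiguous bases (N etc.) are skipped; undefined ratios are returned as NaN.
    """
    mono: Counter = Counter()
    di: Counter = Counter()
    for strand in (seq.upper(), reverse_complement(seq.upper())):
        mono.update(b for b in strand if b in BASES)
        for i in range(len(strand) - 1):
            pair = strand[i:i + 2]
            if pair[0] in BASES and pair[1] in BASES:
                di[pair] += 1
    n_mono, n_di = sum(mono.values()), sum(di.values())
    rho = {}
    for x, y in product(BASES, repeat=2):
        f_x, f_y = mono[x] / n_mono, mono[y] / n_mono
        rho[x + y] = (di[x + y] / n_di) / (f_x * f_y) if f_x and f_y else float("nan")
    return rho


random.seed(1)
genome = "".join(random.choice(BASES) for _ in range(200_000))
depleted = genome.replace("CG", "CA")   # artificially remove every CG step
r_random = dinucleotide_rho(genome)
r_depleted = dinucleotide_rho(depleted)
print(round(r_random["CG"], 2), round(r_depleted["CG"], 2))
```

For the random sequence every ρ lies close to 1, whereas the CG-depleted copy gives ρ_CG = 0; real genomes fall in between, and values below about 0.78 are the ones reported as underrepresented in Table 4.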

Palindromic hexanucleotide sequences are generally underrepresented in the genome of N. punctiforme, some extremely so. The most underrepresented hexanucleotides are sites for restriction enzymes that have been found in some species of Nostoc or Anabaena. This phenomenon, previously noted in the E. coli genome (Elhai 2001), points to a remarkable degree of DNA exchange amongst the Nostocaceae. Bias against palindromic sequences is also seen in tetranucleotides but not in pentanucleotides. The only markedly underrepresented pentanucleotide is GGWCC, the recognition site for the common cyanobacterial restriction enzyme AvaII.
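The palindrome bias can be explored with a sketch like the one below, which counts every hexanucleotide and divides by the count expected from base composition alone. The function names and the zero-order background model are assumptions for illustration (published analyses of this kind usually control for shorter words with Markov models); the synthetic genome with every BglII site (AGATCT) destroyed simply mimics the kind of restriction-site avoidance described in the text.

```python
import random
from collections import Counter
from itertools import product
from math import prod
from statistics import median
from typing import Dict, List

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def palindromes(k: int) -> List[str]:
    """All k-mers identical to their own reverse complement (none exist for odd k)."""
    kmers = ("".join(p) for p in product(BASES, repeat=k))
    return [w for w in kmers if w == reverse_complement(w)]


def obs_over_exp(seq: str, k: int) -> Dict[str, float]:
    """Observed / expected count of every k-mer under a base-composition model.

    Expected count = (number of windows) * product of the k base frequencies.
    """
    seq = seq.upper()
    windows = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(windows))
    freq = {b: seq.count(b) / len(seq) for b in BASES}
    result = {}
    for w in ("".join(p) for p in product(BASES, repeat=k)):
        expected = windows * prod(freq[b] for b in w)
        result[w] = counts[w] / expected if expected else float("nan")
    return result


random.seed(7)
genome = "".join(random.choice(BASES) for _ in range(300_000))
# Mimic site avoidance: destroy every BglII site without creating new ones.
genome = genome.replace("AGATCT", "AGTTCT")

oe = obs_over_exp(genome, 6)
pal = palindromes(6)
print(len(pal))                                   # 64 palindromic hexamers
print(oe["AGATCT"])                               # 0.0: the site has been eliminated
print(round(median(oe[w] for w in pal), 2))       # close to 1 for the remaining palindromes
```

In a genome like that of N. punctiforme one would expect the median over palindromes to fall well below 1, with specific restriction sites near 0; in this random sequence only the deliberately removed site is depleted.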

The foregoing is no less true for the genome of Anabaena PCC 7120; however, not all cyanobacteria have genomes biased against palindromic oligonucleotides. The genomes of P. marinus MED4 and Synechococcus WH8102 exhibit no bias for or against palindromic sequences. The genome of Synechocystis PCC 6803 lies midway in this regard between the extremes of N. punctiforme on one hand and the marine cyanobacteria on the other.

Restriction/modification systems and solitary DNA methyltransferases

Given the probable effect restriction systems have had on the makeup of the N. punctiforme genome, it is not surprising to find within the genome evidence for the transient residence of different DNA methyltransferases. N. punctiforme currently has one known and one suspected Type II restriction/modification system: those recognizing the same sites as BglII and AcyI (Table 5). The latter may be inactive. In addition, however, the genome has parts of five probably nonfunctional Type I restriction/modification systems and five Type II methyltransferases of unknown specificity and function. None of these systems shows similarity to gene products predicted from the Anabaena PCC 7120 genomic sequence, but they often show very high similarity to enzymes from other bacteria. These are prime examples of genes evidently acquired by horizontal transfer, preserved because they either conferred a temporary selective advantage on their host (Bickle and Kruger 1993) or actively maintained their parasitic presence (Kobayashi et al. 1999).

Table 5. Probable DNA modification enzymes of N. punctiforme

Recognition   Anabaena(a)  Synechocystis(a)  Prototype      Most similar protein(b)
sequence

Solitary methyltransferases
GATC          DmtA         MbpA              Dam, M.MboI    Anabaena, 10^-116
GGCC          DmtB         Sll0729           M.HaeIII       Anabaena, 10^-123
CGATCG        DmtC         SynMI             M.PvuI         Anabaena, [illegible]
rCCGGy        DmtD         -                 M.Cfr10I       Anabaena, [illegible]

Restriction/modification systems
AGATCT        -            -                 M.BglII        Bacillus, [illegible]
                                             R.BglII        Bacillus, [illegible]
GrCGyC(c)     -            -                 M.AcyI         Haemophilus, 3 × 10^-24
                                                            Herpetosiphon, [illegible]

(a) Protein (or predicted protein) from the indicated cyanobacterium with strong similarity to the proposed N. punctiforme modification enzyme.
(b) Organism with the protein (or predicted protein) most similar to the proposed N. punctiforme modification enzyme. The BLAST E score is shown.
(c) The proposed restriction enzyme may be inactive, owing to a stop codon near the 5' end of the region similar to corresponding restriction enzymes.

In striking contrast to the lack of similarity between the restriction/modification proteins and known cyanobacterial proteins, four DNA methyltransferases without corresponding restriction enzymes (solitary methyltransferases) predicted from the N. punctiforme genomic sequence are nearly identical to those found in Anabaena PCC 7120 (Table 5; Matveyev et al. 2001). Three of these (DmtA, DmtB and DmtC) appear to be widely distributed amongst cyanobacteria. The genome of N. punctiforme is thus almost certainly highly methylated at sites common to most cyanobacteria (GATC, GGCC, CGATCG), at another site perhaps common to the Nostocaceae (rCCGGy; methylated by DmtD, Matveyev et al. 2001), and at sites peculiar to itself (AGATCT, GrCGyC, and perhaps several others from degrading Type I restriction/modification systems).
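The recognition sequences named here mix plain bases with IUPAC degenerate codes written in lower case (r = A/G, y = C/T, w = A/T). The following is a minimal sketch for counting such sites in a sequence, assuming the standard IUPAC meanings. The helper names are hypothetical. Every site listed in the paragraph above is its own reverse complement, so scanning one strand counts each site once.

```python
import re

# IUPAC nucleotide codes; degenerate positions appear in lower case in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "W": "[AT]", "S": "[CG]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
         "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}


def site_pattern(site: str):
    # A zero-width lookahead lets overlapping occurrences be counted.
    return re.compile("(?=" + "".join(IUPAC[c] for c in site.upper()) + ")")


def count_sites(seq: str, site: str) -> int:
    """Count (possibly overlapping) matches of a recognition site on one strand."""
    return sum(1 for _ in site_pattern(site).finditer(seq.upper()))


def methylation_site_density(seq: str, sites) -> dict:
    """Sites per kilobase for each recognition sequence."""
    kb = len(seq) / 1000
    return {s: count_sites(seq, s) / kb for s in sites}
```

For example, `methylation_site_density(genome, ["GATC", "GGCC", "CGATCG", "rCCGGy", "AGATCT", "GrCGyC"])` would give a rough picture of how densely the genome could be methylated.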

Multiple occurrences of HIP1 sequences

By far the most frequently occurring oligonucleotide is the octanucleotide sequence GCGATCGC, called HIP1 (Robinson et al. 2000). The sequence is widespread amongst cyanobacteria, but its function remains unknown. HIP1 occurs with a frequency of once every 1200 bp in the N. punctiforme genome, or about once every 800 bp if one also counts frequently occurring sequences with at least 6 matches to the consensus sequence (Table 6). The distribution of HIP1 sites in the genome is nearly random, except for distances less than 40 bp. This indicates that the selective pressure favoring HIP1 sites does not operate when a site already exists in close proximity. Although the function of HIP1 sites in cyanobacteria remains a mystery, it is interesting that the sites contain two internal methyltransferase targets (CGATCG and GATC) that are common to most, if not all, of the cyanobacteria that carry HIP1 sites.
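The counting and spacing analysis described above can be sketched as follows. This is a minimal illustration with hypothetical function names. It is not the procedure used to build Table 6. In particular, the paper counts only frequently occurring variants with at least 6 matches, whereas a plain scan with `min_matches=6` counts every such window. With equal base frequencies, roughly 1 window in 240 matches at 6 or more positions, so that scan would over-count. As an illustrative baseline, also assuming equal base frequencies, an exact 8-mer is expected once per 4^8 = 65,536 bp. The observed once per 1,200 bp is therefore roughly 55 times higher.

```python
HIP1 = "GCGATCGC"  # palindromic: its reverse complement is itself


def hip1_positions(seq: str, min_matches: int = 8) -> list:
    """Start positions of windows matching HIP1 at >= min_matches of 8 positions.

    Because HIP1 equals its own reverse complement, a near-match on the other
    strand appears on this strand with the same number of mismatches, so one
    strand is scanned.
    """
    seq = seq.upper()
    k = len(HIP1)
    return [i for i in range(len(seq) - k + 1)
            if sum(a == b for a, b in zip(seq[i:i + k], HIP1)) >= min_matches]


def spacings(positions: list) -> list:
    """Distances between successive site starts. The share of spacings below
    40 bp can be compared with that expected for randomly placed sites."""
    return [b - a for a, b in zip(positions, positions[1:])]
```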

HIP1 sequences are found with a comparable frequency in Anabaena PCC 7120 and a somewhat higher frequency in Synechocystis PCC 6803 but, surprisingly, almost not at all in the marine strains (Table 6). The consensus sequence in Synechocystis PCC 6803 is more relaxed in internal positions than in N. punctiforme and Anabaena PCC 7120, but it exhibits a strong preference for a flanking 5' G and 3' C not seen in the HIP1 sequences of the filamentous cyanobacteria.
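A flanking-base preference like the one in Synechocystis PCC 6803 can be checked by tallying the bases immediately 5' and 3' of each exact HIP1 site. The following is a minimal sketch with a hypothetical function name. Because HIP1 is palindromic, a 5' G on one strand is the same feature as a 3' C on the complementary strand. That is why the preference appears as a symmetric pair.

```python
from collections import Counter


def hip1_flanks(seq: str, site: str = "GCGATCGC") -> tuple:
    """Tally the base immediately 5' and immediately 3' of each exact site
    on one strand (overlapping occurrences included)."""
    seq = seq.upper()
    five_prime, three_prime = Counter(), Counter()
    start = seq.find(site)
    while start != -1:
        if start > 0:
            five_prime[seq[start - 1]] += 1
        end = start + len(site)
        if end < len(seq):
            three_prime[seq[end]] += 1
        start = seq.find(site, start + 1)
    return five_prime, three_prime
```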

Contiguous repeated sequences

Contiguous repeated sequences were discovered serendipitously in some Anabaena PCC 7120 sequences (e.g. Mazel et al. 1990; Holland and Wolk 1990), but their prevalence in the genomes of N. punctiforme and Anabaena PCC 7120 is astonishing. Such sequences, where the repeated region is at least 20 bp, account for about 1.5% of the total DNA within N. punctiforme (approximately 7.5% of the DNA in intergenic regions). Many contiguously repeated sequences are found at multiple sites in the genome. The most frequently encountered sequences are shown in Table 7. All the most frequent unit sequences are heptamers. They tend to fall into families, which suggests that proteins recognizing them may have degenerate specificities. The sequences common in N. punctiforme overlap considerably with those found in Anabaena PCC 7120, but their frequencies are quite different (Table 7). Two of three previously named heptameric short tandemly repeated repetitive sequences, STRR1 and STRR2 (Mazel et al. 1990), occur frequently in the N. punctiforme genome. However, STRR3 does not appear at all, nor does the 37-bp repeat (LTRR) noted in Anabaena PCC 7120 (Masepohl et al. 1996). We have named four repeated sequences STRR4, STRR5, STRR6 and STRR7. One of these, STRR5, has been noted previously in a strain of Nostoc (Angeloni and Potts 1994).
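Contiguous repeats of a fixed unit length can be located by looking for runs in which each base equals the base one period downstream. The following is a minimal sketch for perfect heptameric repeats of at least 20 bp, the thresholds used above. The function name is hypothetical. The sketch does not tolerate mismatches or indels, so it will undercount imperfect repeats. It reports the unit in whatever phase the repeat happens to start. It also does not check that the period is minimal, apart from skipping homopolymer runs (for a period of 7 these are the only shorter-period case).

```python
def tandem_repeats(seq: str, period: int = 7, min_len: int = 20) -> list:
    """Perfect contiguous repeats with the given unit length.

    Returns (start, unit, iterations) for every region of at least `min_len`
    bp in which each base equals the base `period` positions downstream.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    i = 0
    while i + period < n:
        if seq[i] != seq[i + period]:
            i += 1
            continue
        j = i
        while j + period < n and seq[j] == seq[j + period]:
            j += 1
        length = (j - i) + period  # matching positions plus the final unit
        unit = seq[i:i + period]
        if length >= min_len and len(set(unit)) > 1:  # skip homopolymers
            hits.append((i, unit, length / period))
        i = j
    return hits
```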

The heterocyst-forming cyanobacteria stand apart from other cyanobacteria in the prevalence and type of contiguous repeated sequences. Synechocystis PCC 6803 and Synechococcus WH8102 have few repeated sequences except those based on triplets or their multiples. These presumably are repeated codons. P. marinus has many nontriplet repeats, but almost all are extremely AT-rich and may merely reflect the presence of AT-rich regions in the genome. E. coli similarly lacks contiguous repeated sequences, and it has been suggested that prokaryotes generally have few such repeats (Field and Wills 1998). The excess of heptameric repeats thus appears to be unique to the heterocyst-forming cyanobacteria. No clue has yet been found to their functions.
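Table 7 (footnote b) treats a repeat unit as equivalent to any circular permutation of itself or of its inverse, and reports the permutation lowest in the alphabet. The following is a minimal sketch of that normalization, assuming "inverse" means the reverse complement. It also includes a check for units "no more than one base removed" from one another (footnote d), assuming this means at most one substitution. The function names are hypothetical. The table also reflects which unit predominates at each site, so this sketch is not guaranteed to reproduce every entry.

```python
_COMP = str.maketrans("ACGT", "TGCA")


def unit_variants(unit: str) -> set:
    """All circular permutations of a unit and of its reverse complement."""
    unit = unit.upper()
    rc = unit.translate(_COMP)[::-1]
    return {s[i:] + s[:i] for s in (unit, rc) for i in range(len(s))}


def canonical_unit(unit: str) -> str:
    """The variant that sorts lowest alphabetically."""
    return min(unit_variants(unit))


def within_one_base(a: str, b: str) -> bool:
    """True if some variant of `a` differs from `b` by at most one substitution.

    The variant set is closed under rotation and reverse complementation, so
    comparing all variants of `a` against `b` itself is sufficient.
    """
    if len(a) != len(b):
        return False
    target = b.upper()
    return any(sum(x != y for x, y in zip(v, target)) <= 1
               for v in unit_variants(a))
```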

Table 7. Most frequently occurring contiguously repeated sequences in N. punctiforme

                            Sites in N.           Related sites in      Sites in Anabaena
                            punctiforme(c)        N. punctiforme(d)     PCC 7120(c)
                            Sites in  Avg # of    Sites in  Avg # of    Sites in  Avg # of
Rank(a)  Unit sequence(b)   genome    iterations  genome    iterations  genome    iterations  Rank(a)

AATGACh (STRR2)
1        AATGACA            69        5.0         242       5.0         10        4.7         35
2        AATGACT            63        5.1         237       4.9         33        5.8         8
5        AATGACC            39        5.0         236       4.9         14        4.2         26
AATTCCC (STRR4)
4        AATTCCC            41        4.7         190       4.7         5         5.4         75
7        AATGCCC            37        4.5         179       4.8         0         -           -
3        AATTACG (STRR5)    45        4.2         98        4.3         8         4.5         43
AdTCCCC (STRR1)
6        ATTCCCC            39        4.7         180       4.6         8         5.0
21       AATCCCC            19        5.1         183       4.7         58        6.1         2
28       AGTCCCC            15        4.7         97        4.6         54        5.3         3
8        AGCAGGG (STRR6)    29        4.4         50        4.2         5         3.8         78
30       AAAATTC (STRR7)    13        3.7         128       4.1         75        4.1         1

(a) Rank of unit sequence ordered by the number of sites in the genome of N. punctiforme. The eight highest ranked unit sequences are shown, plus the three highest ranked of Anabaena PCC 7120.
(b) Predominant unit sequence of contiguous repeat. The unit may be considered to be any of the possible circular permutations of the sequence and its inverse. The permutation appearing lowest in the alphabet was preferred to facilitate matching. STRR1 and STRR2 are according to Mazel et al. (1990).
(c) Sites in the genome where there appears a contiguous repeat with the given unit sequence predominating.
(d) Sites in the genome where there appears a contiguous repeat where the predominating unit sequence is no more than one base removed from the given unit sequence.

Insertion sequences

The genomic sequence of N. punctiforme in its current state carries well over 150 ORFs that are significantly similar, in whole or in part, to transposases. It is difficult to arrive at an accurate count. In part, this is because contigs often end within transposases, owing to the difficulty of assembling these ends. Furthermore, the transposases exist in multiple states of degradation. It is evident, however, that the genome has suffered waves of infection by foreign sequences. These sequences inserted themselves multiple times in the genome and progressively mutated to nonfunctionality and eventual oblivion. Early on, copies of the gene encoding the transposase suffer mutation, but the insertion sequence of which it is a part continues to transpose, using transposase encoded by sister sequences. Once the last transposase gene is rendered nonfunctional, the insertion sequence is dead and becomes a target of mutation and of insertion by other insertion sequences. All stages in this progression of events are evident in the genomic sequence. The functionality of an insertion sequence can be discerned from the sequence only by the presence of multiple copies of identical or nearly identical sequences with identical termini. By this conservative criterion, N. punctiforme has close to six active insertion sequences, each with multiple insertions.
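The conservative criterion above requires multiple identical or nearly identical copies with identical termini, and it can be expressed as a simple grouping step. The following sketch assumes the candidate copies have already been extracted as sequences. The thresholds `end_len` and `min_identity` and the function names are illustrative assumptions, not values from the text. The ungapped identity is only a crude stand-in for a proper alignment.

```python
from collections import defaultdict
from itertools import combinations


def ungapped_identity(a: str, b: str) -> float:
    """Fraction of identical positions; sequences of unequal length score 0."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def putative_active_families(copies: dict, end_len: int = 20,
                             min_identity: float = 0.98) -> list:
    """Group candidate insertion-sequence copies (name -> sequence) that share
    identical termini and are nearly identical overall. Return the sorted name
    lists of groups with at least two members."""
    groups = defaultdict(list)
    for name, s in copies.items():
        s = s.upper()
        groups[(s[:end_len], s[-end_len:])].append((name, s))
    families = []
    for members in groups.values():
        if len(members) < 2:
            continue
        if all(ungapped_identity(a, b) >= min_identity
               for (_, a), (_, b) in combinations(members, 2)):
            families.append(sorted(name for name, _ in members))
    return families
```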

Almost all of the transposases encoded in the N. punctiforme genome are most similar to proteins reported in other cyanobacteria (often Synechocystis PCC 6803). Of these, roughly two-thirds are most similar to proteins from Anabaena PCC 7120. However, there are exceptions. For example, one predicted N. punctiforme protein is most similar to a Tn21-like transposase from Bacillus anthracis. It has a BLAST score of less than 10^-200 and no significant similarity to any protein in Anabaena PCC 7120 or Synechocystis PCC 6803. Neither P. marinus MED4 nor Synechococcus WH8102 has any open reading frames recognizable as transposases or insertion sequence proteins.



Genetic information specific to energy, carbon and nitrogen metabolism

Energy and carbon metabolism 

Photosynthetic electron transport 
In N. punctiforme, as in other cyanobacteria, the genes encoding proteins of the reaction center complexes, the cytochrome b6/f complex and the electron carriers between the complexes are not clustered in the genome, although there appears to be some operon structure. This contrasts with the large gene clusters in the purple non-sulfur anoxygenic photosynthetic bacteria, such as Rhodobacter sphaeroides (Naylor et al. 1999). Except for psbA (32 kDa or D1 protein) and psbD (34 kDa or D2 protein), the genes encoding PS II reaction center proteins are present in single copies. There are four complete and three truncated copies of psbA and one complete and one truncated copy of psbD. The truncated copies appear at the ends of contigs and may turn out to be complete genes once sequencing of the genome is complete. These two genes are well-known members of multigene families in cyanobacteria, and differential transcription of psbA and psbD in response to varying light intensities has been documented (Golden 1994). The genes encoding PS I reaction center proteins occur as single copies, with psaA and psaB colocated in the genome in a putative operon. Two additional truncated copies of psaB also appear at the ends of their respective contigs. Genes encoding proteins of the cytochrome b6/f complex appear as pairs in putative operons with the following linkages: petB (cytochrome b6)-petD (subunit IV), petC (Rieske Fe/S protein)-petA (cytochrome f), and petE (plastocyanin)-petJ (cytochrome c-553). There is also a second, solitary copy of petJ. There is no evidence for linkage of the gene pairs. In the presence of sufficient copper, cyanobacteria synthesize plastocyanin. When starved for copper, however, they synthesize cytochrome c-553 as the soluble electron carrier between the cytochrome b6/f and PS I complexes (Morand et al. 1994). N. punctiforme contains a single gene (petH) encoding ferredoxin-NADP oxidoreductase. It also has 15 putative ferredoxin genes, of which 9 encode the 2Fe-2S PetF type and 6 encode a 4Fe-4S type.

Pigment synthesis 

In cyanobacteria, tetrapyrrole biosynthesis is initiated with glutamyl-tRNA as the substrate for 5-aminolevulinic acid synthesis (Beale 1999). The N. punctiforme genome has four genes encoding glutamyl-tRNA synthetase and single genes encoding glutamyl-tRNA reductase and glutamate 1-semialdehyde aminotransferase, consistent with this biosynthetic pathway. It is of interest that genes encoding both oxygen-dependent and oxygen-independent coproporphyrinogen III oxidases are present in the N. punctiforme genome, indicating the potential for porphyrin synthesis in both oxic and anoxic environments. N. punctiforme also contains genes encoding proteins involved in light-dependent regulation of PE synthesis. These were identified by similarity to genes in Fremyella diplosiphon (also known as Calothrix sp. strain PCC 7601) (J. Cobley, pers. comm.).

N. punctiforme exhibits Type II CCA, but the question remains whether it also has the potential for Type III CCA. Type III CCA is associated with two or more clusters of cpc genes encoding PC. Unfortunately, the sequence of the cpc genes is incomplete in the current N. punctiforme database. The cpcCDEFGH genes encoding linker proteins are present in single copies. Among the genes encoding the apoproteins for chromophore binding, however, the cpcB gene is missing, and only a fragment of cpcA at the end of a contig can be unequivocally identified. Therefore, we cannot predict the number of cpcBA operons. Two clusters of cpeBA genes encoding the PE apoprotein subunits can be detected, but this may be an assembly error. Insertions into cpeBA result in an N. punctiforme mutant lacking detectable PE, implying a single functional copy (F.C. Wong, E.L. Campbell and J.C. Meeks, unpublished). It is not clear why automated cloning and sequencing of genes involved in phycobiliprotein synthesis has proven difficult in N. punctiforme. Type III CCA is also characterized by a signal transduction pathway involving RcaE, RcaF and RcaC (Bhaya et al. 2000). ORFs with regions of similarity to the corresponding structural genes are present in the N. punctiforme genome, but their domain organization is often inconsistent. Thus, the presence of genes associated with Type III CCA is unresolved.

Respiratory electron transport 

In cyanobacteria, the cytochrome b6/f complex is shared by the respiratory and photosynthetic electron transport systems (Schmetterer 1994). Genes are present that are consistent with synthesis of both NADH dehydrogenases (NADH:plastoquinol oxidoreductase): the multiprotein, mitochondrial-like type-1 enzyme and the 1- to 2-subunit, FAD-containing bacterial type-2 enzyme. A contiguous type-1 ndhCKJ cluster can be detected, similar to other cyanobacteria (Schmetterer 1994), but the other genes are unlinked. Four gene copies encoding the large subunit of the type-2 dehydrogenase are present. There appear to be at least 4, and possibly 6, copies of a ctaCDE operon encoding subunits II, I and III, respectively, of an aa3-type cytochrome c oxidase. ctaD is truncated 4 times at the ends of contigs, twice in each orientation.

Carbon metabolism - assimilation 
In cyanobacteria, carbon dioxide assimilation occurs through the reductive pentose phosphate (Calvin-Benson-Bassham) pathway and hexose catabolism through the oxidative pentose phosphate pathway (Smith 1982). The enzymes specific for the reductive pathway are phosphoribulokinase (PRK) and ribulose 1,5-bisphosphate carboxylase/oxygenase (rubisco). The genes encoding both enzymes are present in single copies. The rubisco genes are clustered with the following transcriptional orientation: rbcL-rbcX-rbcS-orfH1-orfH2-orfH3-rca. Here L is the rubisco large subunit, X is a protein of ambiguous function, S is the rubisco small subunit, rca is rubisco activase, and orfH1-3 encode hypothetical proteins with similarity to proteins encoded in the Synechocystis PCC 6803 genome. A putative rubisco transcriptional regulator (rbcR or cbbR) is located near a cluster of 7 genes encoding proteins involved in a carbon dioxide concentrating mechanism (Ccm). The orientation of this cluster is ccmK3-ccmK2-ccmL-ccmM-ccmN-fpg-ccmK1-ccmML-rbcR. In this cluster, fpg is a putative formamidopyrimidine-DNA glycosylase, and ccmML is found between ccmM and ccmL in Proteobacteria. Genes encoding ccmK4 and ccmK5 are located elsewhere. N. punctiforme has at least 5 genes encoding carbonic anhydrase, an essential Ccm enzyme, located in solitary positions throughout the genome. It is not clear whether a specific bicarbonate transport system is present as part of the Ccm. Candidate ORFs identified by BLAST show a high degree of similarity to both identified bicarbonate transporters and identified nitrate/nitrite transporters. Mutation of N. punctiforme and phenotypic characterization will be required to establish the specificity unequivocally.

Carbon metabolism - catabolism 
The enzymes specific for the oxidative pentose phosphate pathway are glucose-6-phosphate dehydrogenase (G6PD) and 6-phosphogluconate dehydrogenase (6PGD). The N. punctiforme genome has three copies each of the genes encoding these proteins, a redundancy thus far unprecedented in cyanobacteria. One copy of the gene encoding G6PD (zwf) lies within an operon (opc; Summers et al. 1995) along with three other genes, in the orientation fbp-tal-zwf-opcA. The gene fbp encodes fructose 1,6-bisphosphatase, tal encodes transaldolase, and opcA encodes a protein allosteric effector of G6PD (Hagen and Meeks 2001). The organization of these genes is the same in Anabaena PCC 7120, but they are organized differently in other cyanobacteria. Synechococcus WH8102 and P. marinus MED4 have an apparent zwf-opcA operon, with the other genes located elsewhere. In Synechococcus elongatus PCC 7942, tal is located apart from the cluster (Newman et al. 1995), and in Synechocystis PCC 6803 the four genes are all unlinked in the chromosome. In N. punctiforme, one gnd gene encoding 6PGD is located elsewhere, while the other two copies are colocated with the two additional copies of zwf. The two additional copies of G6PD share 67% similarity with each other, but only 49-53% similarity to the G6PD encoded by zwf in the opc operon. The two 6PGD proteins encoded by genes linked to zwf show 78% similarity to each other and about 45% similarity to the solitary 6PGD. Expression studies have not been done to identify which gene copies are predominantly transcribed. Because inactivation of zwf in the opc operon yields defects in nitrogen fixation and dark heterotrophic growth of N. punctiforme, the role of the additional copies of zwf and gnd is unclear.

Nitrogen assimilation 
Nitrogenase genes

Nitrogen fixation is mediated by the nitrogenase enzyme complex, which requires on the order of 20 gene products for its synthesis and assembly (Dean and Jacobson 1992). The structural genes for dinitrogenase (nifD and nifK) and dinitrogenase reductase (nifH) are well conserved among all bacteria, including cyanobacteria, where these genes form an operon. The organization and order of a large cluster of nif and nif-related genes in cyanobacteria are highly conserved (Thiel et al. 1997, 1998). In N. punctiforme, Anabaena PCC 7120, A. variabilis ATCC 29413 and probably Synechococcus RF-1, the gene order is nifB-fdxN-nifS-nifU-nifH-nifD-nifK-[illegible]-nifN-nifX-orf-orf-nifW-hesA-hesB-fdxH (Buikema and Haselkorn 1993; Thiel et al. 1997, 1998; Huang et al. 1999).

The N. punctiforme nifD gene has a 24-kb excision element near its 3' end. The nifD gene of Anabaena PCC 7120 is interrupted at exactly the same location by an 11-kb excision element (Golden et al. 1985), and the nifD gene in A. variabilis also has an 11-kb excision element (Brusca et al. 1989). The 11-kb and 24-kb elements share two genes. The first is a highly conserved excisase gene (xisA), located at the beginning of the element and required for excision of the element during heterocyst differentiation. The second is a small ORF of unknown function. Otherwise, there is no similarity between the two nifD elements. The N. punctiforme element contains homologs of two genes of unknown function in Synechocystis PCC 6803 and a homolog of an Anabaena sp. gene called protein X (Sato 1994). The other 14 ORFs in the 24-kb element have no similarity to cyanobacterial genes. However, one ORF has similarity to bacterial reverse transcriptase genes (ret), which supports the suggestion that such elements may be remnants of lysogenic phage (Ramaswamy et al. 1997).

The nif region of N. punctiforme differs from that of Anabaena PCC 7120 in several other respects. N. punctiforme lacks the 55-kb excision element in the fdxN gene that is present in Anabaena PCC 7120 (Golden et al. 1988). Upstream of nifH in N. commune (Potts et al. 1992) and in N. punctiforme, there is a hemoglobin-like gene called cyanoglobin (glbN) whose function is not known. Phylogenetic analysis of the nifH and nifD genes of N. punctiforme indicates that they are most closely related to their homologs in N. commune (T. Thiel, unpublished). About 10 kb upstream of nifB in N. punctiforme are homologs of nifP, nifZ and nifT. Just downstream of the major nif cluster are genes for an uptake hydrogenase, including hupS and hupL. In Anabaena PCC 7120, the hupL gene is distant from the nif region and is interrupted by a 10.5-kb excision element (Carrasco et al. 1995). This element is absent from hupL of N. punctiforme and of several other strains of cyanobacteria (Tamagnini et al. 2000).

The nif genes described above encode the molybdenum-dependent nitrogenase that functions exclusively in heterocysts. A. variabilis and a few closely related cyanobacterial strains (but not Anabaena PCC 7120) have two alternative nitrogenases (Thiel 1998; Thiel and Pratte 2001). One is a vanadium-dependent nitrogenase that functions only in the absence of Mo, under conditions in which heterocysts form (Thiel 1996). The other is a Mo-dependent nitrogenase that functions in vegetative cells under strictly anoxic conditions (Thiel et al. 1997). N. punctiforme has only one complete set of nif genes, and these appear to encode the heterocyst-specific Mo-nitrogenase. N. punctiforme has two additional copies of nifH and one additional copy each of nifE and nifN. One of the extra nifH copies lies immediately upstream of nifE and nifN, forming a contiguous cluster of three genes. The third copy of nifH has no other nif genes nearby. It is not unusual for bacteria to have multiple copies of nifH: Anabaena PCC 7120 has two and A. variabilis has four. Phylogenetic analysis indicates that the second copy of nifH in N. punctiforme (near the second copy of nifEN) clusters with the N. punctiforme and N. commune nifH genes that are part of the major nif cluster. However, it is less like those two genes than they are like each other (T. Thiel, unpublished). The third nifH in N. punctiforme is closely related to the A. variabilis nifH gene that appears to encode the dinitrogenase reductase of the V-nitrogenase (T. Thiel, unpublished). None of the nifH genes in N. punctiforme is closely related to the second copy of nifH in Anabaena PCC 7120.



Nitrate and nitrite utilization

In Anabaena PCC 7120, the genes involved in the uptake and reduction of nitrate and nitrite form a cluster: nirA (nitrite reductase)-nrtA-nrtB-nrtC-nrtD (ABC transporter)-narB (nitrate reductase) (Frías et al. 1997; Cai and Wolk 1997). In N. punctiforme, the nirA and narB genes are separated by a single gene that appears to encode a permease that transports nitrate/nitrite. This permease is similar to the nitrate/nitrite transporter gene nrtP, identified in the marine strain Synechococcus sp. strain PCC 7002 (Sakamoto et al. 1999). It is also similar to another gene in the same family (napA) found in the marine strains Trichodesmium sp. and Synechococcus sp. strain WH7803 (Wang et al. 2000). A gene for this type of permease is not present in the genomes of Anabaena PCC 7120, Synechocystis PCC 6803 or P. marinus MED4; the last of these cannot grow on nitrate or nitrite. The presence of the permease in N. punctiforme suggests that this type of nitrate/nitrite transporter is not necessarily associated with marine strains of cyanobacteria. In addition to the nrtP/napA permease, N. punctiforme has a cluster of 4 genes on 3 short contigs that together encode an ABC transporter. This transporter has over 90% amino acid similarity to the nrtABCD nitrate transporter of Anabaena PCC 7120. Thus, it appears likely that N. punctiforme has two independent nitrate/nitrite transport systems.

Ammonium assimilation 

Three genes putatively encoding distinct ammonium transport systems are present in the N. punctiforme genome. The enzymes glutamine synthetase (GS) and glutamate synthase (GOGAT) are essential in cyanobacteria for the assimilation of exogenous ammonium and of ammonium derived from nitrate or dinitrogen (Flores and Herrero 1994). Two genes in N. punctiforme show similarity to GS. One has 76% and over 90% amino acid identity to glnA of Synechocystis PCC 6803 and Anabaena PCC 7120, respectively. The other gene is weakly similar to glnA but has greater similarity (46% identity) to tdnQ of Pseudomonas putida. The tdnQ gene encodes a putative amino group transferase that is thought to be involved in the pathway for conversion of aniline to catechol in P. putida (Fukumori and Saint 1997). N. punctiforme has a ferredoxin-dependent GOGAT with over 90% amino acid similarity to GOGAT in Anabaena PCC 7120. Neither N. punctiforme nor Anabaena PCC 7120 (Martin-Figueroa et al. 2000) has genes for a second, NADH-dependent GOGAT such as has been described for Plectonema (Okuhara et al. 1999). In addition to GS/GOGAT, ammonium could be assimilated directly into glutamate via glutamate dehydrogenase (gdhA). GdhA of N. punctiforme shows moderate amino acid similarity (63%) to GdhA of Synechocystis PCC 6803.

Genes involved in heterocyst differentiation 
Genes identified in Anabaena PCC 7120 as important for heterocyst differentiation are also present in N. punctiforme. Many of these genes encode proteins with over 90% amino acid sequence identity between the two strains. This group includes the following:
ntcA, encoding a regulatory protein that is essential for many aspects of nitrogen metabolism (Frías et al. 1994);
hanA, encoding HU, a histone-like protein (Khudyakov and Wolk 1996);
devH, a putative DNA-binding protein (Hebbar and Curtis 2000);
hetR, a protease that regulates an early step in heterocyst differentiation (Zhou et al. 1998);
patB, which affects pattern formation (Liang et al. 1993);
devR, required for heterocyst maturation (Campbell et al. 1996);
devBCA, encoding an ABC transporter required for heterocyst envelope formation (Fiedler et al. 1998).
There is a second copy of devA (90% amino acid identity), but not of devBC.

Several genes encode proteins with about 60-70% amino acid identity (70-85% similarity) between the strains. These include hetF, a positive regulator of heterocyst differentiation (Wong and Meeks 2001). They also include three genes involved in heterocyst envelope synthesis: hepA (Holland and Wolk 1990), hepK, a sensor histidine kinase of a two-component regulatory system (Zhu et al. 1998), and hglK (Black et al. 1995). The group also contains patA, which affects pattern formation (Liang et al. 1992), hetM (polyketide synthase) and hetI (unknown function). In Anabaena PCC 7120, hetM, hetN and hetI are part of a contiguous gene cluster (Black and Wolk 1994). The homologs of hetM and hetI are also clustered in N. punctiforme, but the most similar homolog of hetN (a ketoacyl reductase) is located on a different contig. This gene encodes a protein with 70% amino acid similarity to HetN and has no homologs of hetM or hetI nearby. Interestingly, however, a gene with similarity to a ketoacyl reductase from Mycobacterium tuberculosis (54% amino acid similarity) is located between hetM and hetI in N. punctiforme. Another polyketide synthase gene, with about 50% amino acid similarity to hetM, is present on a different contig.
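Identity and similarity percentages like these come from pairwise alignments. In BLAST output, for example, similarity corresponds to the "Positives", the positions with a positive substitution score. The following simplified sketch shows why similarity always equals or exceeds identity. It assumes an already aligned pair. It also uses a hypothetical grouping of chemically similar residues in place of a real substitution matrix such as BLOSUM62, so its similarity values will not match BLAST's.

```python
# Hypothetical grouping of chemically similar residues; a stand-in for a
# substitution matrix such as BLOSUM62.
GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG", "C", "P"]
GROUP_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def identity_and_similarity(aln_a: str, aln_b: str) -> tuple:
    """Percent identity and percent similarity over the full alignment length.

    Gap columns ('-') count toward the length but never as matches.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    identical = similar = 0
    for x, y in zip(aln_a.upper(), aln_b.upper()):
        if "-" in (x, y):
            continue
        if x == y:
            identical += 1
            similar += 1  # every identical position is also a similar one
        elif x in GROUP_OF and GROUP_OF[x] == GROUP_OF.get(y):
            similar += 1
    n = len(aln_a)
    return 100 * identical / n, 100 * similar / n
```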

A few genes involved in heterocyst differentiation show relatively weak similarity to their counterparts in Anabaena PCC 7120. They include hepC, required for heterocyst envelope synthesis (Zhu et al. 1998). They also include hetP and hetC, which are similar to ABC protein exporters and are required early in heterocyst differentiation (Khudyakov and Wolk 1997). The latter two genes are contiguous in Anabaena PCC 7120 but not in N. punctiforme. The hetP homologs in the two strains have about 70% amino acid similarity; however, there are gaps in the alignment. N. punctiforme has two hetC-like genes with different sequences, but both are about 66% similar (amino acids) to hetC of Anabaena PCC 7120. One possible explanation for the weak similarity of some of these genes between the two strains is that the genes originally served different functions in the two strains and later evolved to take on the functions required for heterocyst differentiation.

The patS gene, which is thought to produce a small peptide inhibitor of heterocyst formation (Yoon and Golden 1998), contains an ORF of only 13 amino acids in N. punctiforme. In Anabaena PCC 7120, patS contains two possible ORFs, of 13 or 17 amino acids. Mutation of either or both Met codons in Anabaena PCC 7120 did not prevent normal heterocyst suppression. However, the mutant patS genes were under the control of the very strong glnA promoter on a plasmid, which may have allowed expression from a cryptic start site (J. Golden, pers. comm.). N. punctiforme has only the 13-amino-acid ORF, preceded just upstream by a good ribosome binding site that is also present in Anabaena PCC 7120. Together these indicate that the precursor peptide is likely 13 amino acids long.
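An upstream in-frame ATG lengthens the peptide, as in the 13- versus 17-residue patS alternatives of Anabaena PCC 7120. Whether a given ATG does so can be checked by translating from each candidate start to the first in-frame stop codon. The following is a minimal sketch with hypothetical function names. It ignores ribosome binding sites and alternative start codons such as GTG. The test sequence is a made-up example, not patS.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_length(seq: str, start: int):
    """Number of codons read from `start` up to (not including) the first
    in-frame stop codon, or None if the frame runs off the end without one."""
    seq = seq.upper()
    codons = 0
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return codons
        codons += 1
    return None


def atg_start_lengths(seq: str) -> dict:
    """Map each ATG position to the length (in amino acids) of the ORF it starts."""
    seq = seq.upper()
    return {i: orf_length(seq, i)
            for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"}
```

With two in-frame ATGs, as in `atg_start_lengths("ATGAAAATGCCCGGGTAA")`, the upstream start gives the longer peptide. Both candidates end at the same stop codon.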

Other notable characteristics of the N. punctiforme genome

Desiccation response 

Many terrestrial cyanobacteria show resistance to water deficit. One mechanism that contributes to desiccation tolerance is the synthesis of non-reducing disaccharides such as trehalose and sucrose. N. punctiforme lacks a homolog of trehalose-6-phosphate synthase (otsA). Only plants and cyanobacteria can synthesize sucrose, and their sucrose-6-phosphate synthases (SpsA) are highly conserved. Homology searches revealed at least 20 proteins that show similarity to sucrose-6-phosphate synthase, and a spsA homolog has tentatively been identified (L. Curatti and G. Salerno, pers. comm.). A spsA homolog is present in the desiccation-tolerant Nostoc commune DRH1. N. punctiforme also contains genes putatively encoding invertase and the bifunctional sucrose synthase, consistent with a capacity to synthesize and degrade sucrose. Synechocystis PCC 6803 can withstand air-drying. It contains otsA, ggpS (involved in synthesis of the compatible solute glucosylglycerol), spsA and sucrose-6-phosphate phosphohydrolase (spp). Synechocystis PCC 6803 also contains four other genes involved in glucosylglycerol metabolism, and only one of these (slr0530) has a homolog in N. punctiforme. The genome of N. punctiforme also lacks any homolog of the water stress protein gene wsp, which is specific to the form species N. commune (Wright et al. 2001).

Other genes important in desiccated cells include
those for superoxide dismutase, catalase and DNA
repair enzymes. N. punctiforme contains three sodA-like
genes as well as a homolog of N. commune sodF,
whose product is the third most abundant protein in the
latter strain. Catalase (katG) in Synechocystis PCC
6803 shares strong homology with counterparts in
a range of other bacterial species, and a gene with
weak similarity is present in N. punctiforme. N. punctiforme
contains several examples of hydrophilins, such
as an HSP70-class molecular chaperone (rod-shape
protein) potentially involved in cell-wall biogenesis.
Hydrophilins are characterized by high glycine content
(>6%) and a high hydrophilicity index (>1.0).
The criterion that defines hydrophilins seems to be an
excellent predictor of responsiveness to hyperosmosis
(Garay-Arroyo et al. 2000). Thus, consistent with its
phenotype of tolerating slow drying, N. punctiforme
may not have the capacity to tolerate the rapidly alternating
desiccation cycles that are characteristic of many
terrestrial cyanobacteria.
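The two hydrophilin criteria quoted above lend themselves to a simple sequence screen. The Python sketch below flags a protein sequence as a candidate hydrophilin when its glycine content exceeds 6% and its mean hydrophilicity exceeds 1.0. The hydrophilicity scale is an assumption: the sketch uses the Hopp-Woods scale, and the index used by Garay-Arroyo et al. (2000) may be computed differently. The function names and the toy sequences are hypothetical.

```python
# Hopp-Woods hydrophilicity values (assumed scale; positive = hydrophilic).
HOPP_WOODS = {
    "A": -0.5, "R": 3.0, "N": 0.2, "D": 3.0, "C": -1.0,
    "Q": 0.2, "E": 3.0, "G": 0.0, "H": -0.5, "I": -1.8,
    "L": -1.8, "K": 3.0, "M": -1.3, "F": -2.5, "P": 0.0,
    "S": 0.3, "T": -0.4, "W": -3.4, "Y": -2.3, "V": -1.5,
}


def _clean(seq: str, scale: dict) -> str:
    """Upper-case the sequence and drop characters not in the scale (e.g. '*', 'X')."""
    cleaned = "".join(aa for aa in seq.upper() if aa in scale)
    if not cleaned:
        raise ValueError("sequence contains no scorable residues")
    return cleaned


def glycine_fraction(seq: str, scale: dict = HOPP_WOODS) -> float:
    """Fraction of scorable residues that are glycine."""
    s = _clean(seq, scale)
    return s.count("G") / len(s)


def hydrophilicity_index(seq: str, scale: dict = HOPP_WOODS) -> float:
    """Mean per-residue hydrophilicity over the whole sequence."""
    s = _clean(seq, scale)
    return sum(scale[aa] for aa in s) / len(s)


def is_hydrophilin(seq: str, min_gly: float = 0.06, min_index: float = 1.0,
                   scale: dict = HOPP_WOODS) -> bool:
    """Apply both criteria: glycine content > 6% and hydrophilicity index > 1.0."""
    return (glycine_fraction(seq, scale) > min_gly
            and hydrophilicity_index(seq, scale) > min_index)


# Hypothetical toy sequences, for illustration only.
print(is_hydrophilin("GKEGDKGEKG"))  # True: 40% Gly, index 1.8
print(is_hydrophilin("GLIVFGLIVF"))  # False: hydrophobic
```

Keeping the scale as a parameter makes it easy to substitute whichever hydrophilicity index a given study actually used.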



Circadian rhythms

Cyanobacteria are now known to exhibit circadian 
rhythms (Golden et al. 1998). A gene cluster encoding
proteins denoted KaiA, KaiB and KaiC is essential for 
the rhythm (Ishiura et al. 1998); KaiC has similarity 
to the RecA superfamily ATPases. A complex sensor 
histidine kinase (CikA) with an attached chromophore 
is a component of the environmental sensing pathway 
involved in entrainment (Schmitz et al. 2000). The N. 
punctiforme genome contains homologs of kaiABC in
a cluster, plus cikA elsewhere, although there is no 
physiological evidence of a circadian rhythm. 



Multigene families 

The multigene families of N. punctiforme can be organized
into 5 broad groupings: environmental sense
and response, transcriptional regulation, transport,
transposition (previously discussed), and those that
encode large complex multidomain proteins.

Environmental sense and response 
Environmental sensing and response in bacteria occurs
primarily through protein histidine-aspartate
phosphorelay systems generally referred to as two-component
regulatory systems (Hoch and Silhavy
1995). The signal sensing and transmitting component
consists of a sensor histidine kinase (or transmitter),
which autophosphorylates an invariant histidine
residue in an ATP-dependent mechanism in response
to an environmental signal. The phosphorylated transmitter
transfers the phosphate to an invariant aspartate
residue in a cognate receiver protein called a response
regulator (or receiver). The response regulators most
often have an output domain that defines a DNA-binding
motif through which the protein regulates
transcription, although some have no output domain.
There also exists a class of complex signal transducers
that contains both transmitter and receiver domains.
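The His-to-Asp phosphotransfer just described can be illustrated with a deliberately simplified Boolean model in Python. It is a sketch, not a kinetic model: ATP consumption, phosphatases, dephosphorylation rates and protein concentrations are all ignored, and the class and function names (`Protein`, `run_relay`) are hypothetical. A two-component system is modeled as one His-containing sensor that donates its phosphoryl group to one Asp-containing receiver; an extended phosphorelay lengthens the chain with alternating His and Asp steps, as in the B. subtilis sporulation pathway (kinases, then Spo0F, Spo0B and Spo0A), and lets several sensor kinases feed the first receiver.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Protein:
    """Toy signalling protein with one phosphorylatable residue ("His" or "Asp")."""
    name: str
    residue: str
    phosphorylated: bool = False


def autophosphorylate(kinase: Protein, signal_present: bool) -> None:
    """A sensor kinase phosphorylates its own His only when its signal is sensed."""
    if signal_present and kinase.residue == "His":
        kinase.phosphorylated = True


def transfer(donor: Protein, acceptor: Protein) -> bool:
    """Move the phosphoryl group from donor to acceptor; return True if it moved."""
    if donor.phosphorylated and not acceptor.phosphorylated:
        donor.phosphorylated = False
        acceptor.phosphorylated = True
        return True
    return False


def run_relay(sensors: List[Protein], signals: Dict[str, bool],
              downstream: List[Protein]) -> bool:
    """Return True if the terminal response regulator ends up phosphorylated."""
    for kinase in sensors:
        autophosphorylate(kinase, signals.get(kinase.name, False))
    # Any phosphorylated sensor can feed the first receiver (signal integration).
    for kinase in sensors:
        if transfer(kinase, downstream[0]):
            break
    # Pass the phosphate down the chain, one step at a time.
    for donor, acceptor in zip(downstream, downstream[1:]):
        transfer(donor, acceptor)
    return downstream[-1].phosphorylated


# Simple two-component system: one sensor kinase, one response regulator.
print(run_relay([Protein("HK1", "His")], {"HK1": True}, [Protein("RR1", "Asp")]))  # True

# Extended phosphorelay: two kinases feed a His-Asp-His-Asp chain.
kinases = [Protein("KinA", "His"), Protein("KinB", "His")]
relay = [Protein("Spo0F", "Asp"), Protein("Spo0B", "His"), Protein("Spo0A", "Asp")]
print(run_relay(kinases, {"KinB": True}, relay))  # True
```

In this toy model, a receiver-only response regulator (such as Spo0F) acts as a relay node rather than an output, which is one way to read the large number of N. punctiforme regulators lacking output domains.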

N. punctiforme has an unusually high number
(255) of combined two-component signal transduction
proteins. By comparison, Synechocystis PCC
6803 has 42 transmitters and 38 response regulators,
and B. subtilis has 38 transmitters and 34 response
regulators, for totals of 80 and 72, respectively.
E. coli contains genes encoding 23 simple
transmitter proteins, 32 response regulator proteins
and 5 complex transmitter-response regulator proteins;
a total of 62. In E. coli, the cognate sensor
kinase and response regulator for a specific environmental
signal also tend to lie contiguously on the chromosome.
In contrast, the vast majority (88%) of the 153 N. punctiforme
transmitter genes are unlinked to a response
regulator. Single-domain sensor histidine kinases
constitute only about 53% of this class of genes,
while genes encoding complex proteins consisting of
transmitter-receiver, transmitter-receiver-transmitter,
and transmitter-receiver-receiver-transmitter (and
other combinations) constitute the remaining 47%.
Of the 102 response regulator genes, 36% encode
receivers with no apparent output domain,
while the remainder are characterized by a helix-turn-helix
DNA-binding output domain. The unusually high
frequency of response regulators lacking output domains
may reflect extensive operation of multiprotein
phosphorelay signaling systems, similar to those that
control sporulation in B. subtilis (Ireton et al. 1993).
The advantage of extended phosphorelay systems is
integration of multiple environmental signals in the
signaling pathway.
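Because the counts above mix absolute numbers and percentages, it can help to back-calculate the approximate number of genes in each category. The short Python sketch below does this arithmetic from the figures quoted in the text (153 transmitters, 102 response regulators, 88%, 53% and 36%). Since the published percentages are rounded, the derived counts are approximate and are not taken from the genome annotation itself; the function name `tally` is hypothetical.

```python
# Counts taken from the text for N. punctiforme.
TRANSMITTERS = 153
RESPONSE_REGULATORS = 102

# Totals for comparison genomes (transmitters + response regulators, from the text).
OTHER_TOTALS = {"Synechocystis PCC 6803": 42 + 38, "B. subtilis": 38 + 34}


def tally() -> dict:
    """Back-calculate approximate gene counts from the rounded percentages."""
    unlinked = round(0.88 * TRANSMITTERS)          # transmitters not adjacent to a regulator
    single_domain = round(0.53 * TRANSMITTERS)     # simple sensor histidine kinases
    no_output = round(0.36 * RESPONSE_REGULATORS)  # receiver-only response regulators
    return {
        "total two-component genes": TRANSMITTERS + RESPONSE_REGULATORS,
        "unlinked transmitters": unlinked,
        "single-domain transmitters": single_domain,
        "complex (hybrid) transmitters": TRANSMITTERS - single_domain,
        "regulators without output domain": no_output,
        "regulators with HTH output domain": RESPONSE_REGULATORS - no_output,
    }


if __name__ == "__main__":
    counts = tally()
    for label, n in counts.items():
        print(f"{label}: ~{n}")
    for genome, n in OTHER_TOTALS.items():
        ratio = counts["total two-component genes"] / n
        print(f"N. punctiforme / {genome}: {ratio:.1f}x")
```

The arithmetic gives roughly 135 unlinked transmitters, about 81 single-domain and 72 complex kinases, and about 37 receiver-only regulators; the combined total of 255 is a little more than three times that of Synechocystis PCC 6803 and about 3.5 times that of B. subtilis.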

Included in the above analyses were the simple
chemotaxis sensor histidine kinases and complex
chromophore-binding sensor histidine kinase proteins
with homology to those mediating CCA (rcaE) and circadian
rhythm (cikA) responses. Additional complex chromophore-binding
sensor histidine kinases are phytochrome
homologs. N. punctiforme contains at
least 6 genes encoding cyanobacterial phytochrome
proteins (Yeh et al. 1997), 2 cph1 and 4 cph2, plus
15 other phytochrome-like proteins; only four of the
total phytochrome-like proteins lack obvious histidine
kinase domains (J. C. Lagarias, pers. comm.). In addition
to simple and complex sensor histidine kinases,
N. punctiforme also contains 55 ORFs encoding eukaryotic-type
serine/threonine protein kinases. The majority
(62%) are single-domain Ser/Thr kinases, but 21 of
these ORFs encode putative proteins with both Ser/Thr
and His kinase domains. Their physiological roles remain
undefined. It is likely that all of these protein
kinases constitute a phylogenetic family of sensory
transduction proteins.

In addition, the N. punctiforme genome contains 
at least 7 genes encoding adenylate cyclase involved 
in cAMP synthesis, perhaps in response to metabolite 
sensing. 

As judged by the number of genes apparently
encoding signal transduction proteins, the potential capacity
of N. punctiforme to sense and respond to environmental
changes is extraordinary, more so than that of any
bacterium characterized to date. The nature and extent
of the environmental signals remain to be determined.
However, we speculate that much of the signal
transduction capacity will be involved in the multiple
developmental alternatives that N. punctiforme
can express in response to environmental signals. The
requirement in heterocyst development for DevR, a
response regulator lacking an output domain (Campbell
et al. 1996), implies that one or more multiprotein
phosphorelay systems are involved in that differentiation
event.

Transcription 

For quite some time, cyanobacteria were thought to
be limited in the extent of transcriptional regulation
in response to environmental changes. The presence
of response regulators with output domains and the
documented differential gene expression in heterocyst
differentiation, nitrogen assimilation and CCA
clearly negate that long-held assumption (Tandeau de
Marsac and Houmard 1993). Nevertheless, transcriptional
regulation is poorly described in cyanobacteria
(Curtis and Martin 1994). In bacteria, transcriptional
regulation can be dictated by promoter sequence recognition
by the σ subunit of DNA-dependent RNA
polymerase and/or by the presence of transcriptional
inhibitors or activators that are not a part of the RNA
polymerase holoenzyme. The N. punctiforme genome
contains 13 apparent alternative σ70-type subunits in addition
to the primary σ70 subunit. Only one alternative
σ70-type subunit has been mutated in N. punctiforme, leading to a
discernible phenotype involving symbiotic interaction
(Campbell et al. 1998). σ54 and the elements of the
signal transduction pathway governing the expression
in Proteobacteria of nitrogen-responsive genes, including
nif (Merrick and Edwards 1995), are absent from the N.
punctiforme genome, except for glnB encoding the PII
protein.

The N. punctiforme genome contains at least 57
genes encoding ancillary transcriptional regulatory
proteins in addition to the putative response regulators
with output domains; 67% are classified only as
predicted and have similarity to a variety of known
regulatory proteins such as TetR, XylR and ArsA.
There are 6 copies of genes with high similarity to
those encoding AraC, 8 of LysR and 2 of MocR.
These collective data imply that N. punctiforme has a
substantial capacity for differential gene expression in
response to a variety of environmental signals.

Transport 

N. punctiforme has 262 ORFs encoding proteins that
play an assigned role in transport of small organic
and inorganic molecules across the cell membrane.
There are 89 ORFs in the N. punctiforme genome
that have been provisionally identified as encoding
the ATPase domain of assigned and unassigned
membrane-associated ATP-binding cassette transport
systems (ABC transporters). In addition, there are 48
organic carbon and ion transporting permeases not associated
with ABC transporters. There appear to be
no representatives of the phosphotransferase system of
enteric bacteria in the N. punctiforme genome.

There are two complete ATP-dependent phosphate
transport systems, each comprising pstS, pstC, pstA
and pstB. In addition, there is a contiguous cluster of
genes, probably also involved in phosphate transport,
with genes similar to pstC and pstA from Archaeoglobus
and to pstB2 from Synechocystis PCC 6803.
Also associated with this cluster is a gene with some
similarity to sbpA (encoding the periplasmic binding
protein for sulfate transport) and another that is similar
to sphX in Synechocystis, a gene that is regulated by
SphR, which responds to phosphate limitation. There
is a single sulfate transport system with two divergent
copies of sbpA followed by cysT and cysW. The ATP-binding
protein of the sulfate transport system, cysA,
is on a different contig. The putative nitrate transport
system is described in the section on nitrate and nitrite
utilization. Three genes have similarity to a putative
glutamine transporter. Molybdenum is probably transported
by the products of genes that are similar to
modA (encoding the periplasmic binding protein) and
a fused ModBC protein that combines the functions of
the permease and the ATP-binding protein. Another
ABC transporter has similarity to putative zinc and
manganese transporters in other bacteria. An unusual
ABC transporter comprising four genes has about 70%
amino acid similarity to the ptxABCD genes that are
thought to function in phosphite transport in Pseudomonas
stutzeri (Metcalf and Wolfe 1998). There are
genes with similarity to sugar transport systems, particularly
ribose and hexose transport, and to peptide
transport systems. Although there are many other
genes with similarity to various components of ABC
transporters, most of these are not associated with a
complete set of genes known to be required for transport
in other bacteria and, hence, may function in
conjunction with other transport systems.

Multidomain proteins putatively synthesizing cyclic peptide toxins

N. punctiforme contains 62 ORFs encoding proteins
involved in the apparent synthesis of cyanobacterial
secondary products classified as microcystins. Microcystins
are hepatotoxins that inhibit eukaryotic protein
phosphatase activity; they are synthesized and
released by a variety of unicellular and filamentous
cyanobacteria (Dow and Swoboda 2000). Structurally,
microcystins are hybrid cyclic peptide-polyketide molecules
of molecular mass between 820 and 1044 Da.
Microcystins are synthesized by the sequential activity
of non-ribosomal peptide synthetases (NRPS) and
polyketide synthases (PKS), together with chain or
side-chain modifying activities. The NRPS and PKS
activities may be confined to respective single proteins
or be co-located on multidomain proteins. The 34
genes encoding the multidomain hybrid NRPS-PKS
proteins in N. punctiforme were erroneously identified
by automated annotation as acyl-CoA synthetase
(AMP forming)/AMP-(fatty) acid ligases I. The ORFs
range in size from 1031 to 14048 bp, and 53 complex
and simple ORFs are clustered in 2 sets of 3 genes
and 1 set each of 4, 5, 6, 12 and 14 genes. The latter
two sets, at 46.78 and 49.22 kb, constitute the largest
common gene clusters in the N. punctiforme genome
and come the closest to the definition of gene islands.

There is no precedent for the production of microcystins
by the terrestrial N. punctiforme, so the
detection of these genes was unanticipated. Essentially
all production of cyanotoxins has been recorded in
aquatic habitats, especially those experiencing dense
growth blooms of cyanobacteria (Dow and Swoboda
2000). Marine cyanobacterial strains produce similar
hybrid molecules; many are also halogenated and have
biological activity (Sitachitta et al. 2000). Such compounds
may be produced more widely in cyanobacteria
than culture and habitat sampling have indicated.
As with antibiotic production by fungi and Gram-positive
bacteria, the physiological role and selective advantage
of secondary product production for the individual
organisms are uncertain.



Conclusions 

The genome of N. punctiforme has many of the characteristics
one would expect of a sequence that is
highly plastic and in a state of flux. The genome
has a conspicuous number of elements (insertion sequences
and multilocus repeats) that can participate
in genome rearrangements, duplications and deletions.
The inventory of transposases and DNA modification
enzymes, and the paucity of restriction sites, indicate
extensive exchange of DNA amongst cyanobacteria,
particularly heterocyst-forming cyanobacteria,
but also significant input of DNA from distantly related
bacteria.

In these regards, the N. punctiforme and Anabaena
PCC 7120 genomes are similar. The Synechocystis
PCC 6803 genome shares certain features, such as a
large number of transposable elements and HIP1 sequences,
but not others, in that there are very few
multilocus repeats and restriction/modification systems.
All three, however, are strikingly different
from the genomes of P. marinus MED4 and Synechococcus
WH8102, which lack all of these elements.
Perhaps these marine cyanobacteria are not often exposed
to foreign DNA and thus have not modified their
genomes to exploit new genetic opportunities.

The phenotype of N. punctiforme is complex, more
so than that of most other cyanobacteria, and essentially all
genes are present that might be required to support
the multifaceted phenotype. The great breadth of tools
available to N. punctiforme to sense and respond to
environmental signals was not anticipated. Moreover,
the identification of putative toxin synthetic genes,
those for circadian rhythms, alternative environmental
sources of phosphorus and sulfate, osmoregulation
via glycine betaine uptake and Ccm indicates that
nearly all characteristics that are collectively represented
in cyanobacteria as a group are present in N.
punctiforme. Apparently absent are genes involved in
sulfide oxidation to support PS2-independent linear
photosynthetic electron transport, analogous to that
in anoxygenic green sulfur bacteria (present in a few
cyanobacteria such as Oscillatoria limnetica; Oren
2000), rapid response to desiccation and rewetting
(Pentecost and Whitton 2000), and perhaps Type III
CCA.

Clearly, genes must be present that determine two
of the most interesting phenotypic characteristics of
N. punctiforme, cell differentiation and symbiotic interaction.
The best understood examples of bacterial
differentiation, such as sporulation by Bacillus and
Myxococcus and swarmer and stalk cell formation
by Caulobacter, provide a wealth of genes specific
for these behaviors. However, ORFs in the N. punctiforme
genome show no more similarity to these
than to genes from nondifferentiating bacteria. Similarly,
genes from Rhizobia known to be involved in
their interaction with legumes have not been useful
in identifying genes required for the interaction of
N. punctiforme with plants. No doubt, the regulatory
mechanisms governing complex bacterial behaviors
evolved multiple times, drawing on the pool of signal
transduction protein kinases and other regulatory
proteins that are so abundant in N. punctiforme.

It will thus be necessary to rely on the traditional
genetic approaches of mutation and phenotypic characterization
to define those genes necessary for the differentiation
of akinetes, heterocysts and hormogonia,
and for the ability to enter into symbiotic associations.
While many of these genes may have already been
recognized as encoding regulatory proteins, many others
may lie amongst the 2164 hypothetical ORFs thus
far unique to N. punctiforme and particularly amongst
the 486 hypothetical genes shared by N. punctiforme
and Anabaena PCC 7120. Within the latter group is
a gene whose product positively regulates heterocyst
differentiation (Wong and Meeks 2001), and two other
genes whose products are important in establishing
the pattern of heterocyst spacing (F.C. Wong and J.C.
Meeks, unpublished).

The ability to manipulate the genome of N. punctiforme
by sequence-specific recombination-directed
mutation and transposon mutagenesis (Cohen et al.
1998; Hagen and Meeks 1999) will prove invaluable
in identifying the function of genes involved in the
complex behaviors exhibited by this organism. The
availability of the genomic sequence, moreover, offers
the possibility of using global gene expression
methodologies to identify genes transcribed under particular
conditions. Genes identified in this way and
fused to easily assayed reporters (Cohen and Meeks
1997; Wong and Meeks 2001) may permit the elucidation
of one of the biggest prizes N. punctiforme has
to offer: the mechanisms by which cyanobacteria and
plants communicate with each other to yield a stable
nitrogen-fixing symbiosis.